Speech-to-Speech
ElevenLabs
Hume AI
Recent speech language models
GLM-4-Voice
Trying out J-Moshi
Moshi: a speech-text foundation model for real-time dialogue
Soundwave: Less is More for Speech-Text Alignment in LLMs
Crossing the uncanny valley of conversational voice
Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
Using smart-turn turn detection to determine in real time whether the user is still speaking
MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance
Thai Semantic End-of-Turn Detection for Real-Time Voice Agents